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In complex engineered systems, completing an objective is sometimes not enough. 
The system must be able to reach a set performance characteristic, such as an 
unmanned aerial vehicle flying from point A to point B, under 10 seconds. This 
introduces the notion of optimality, what is the most efficient, the fastest, the 
cheapest way to complete a task. This report explores the two pillars of optimal 
control, Bellman’s Dynamic Programming and Pontryagin’s Maximum Principle, 
and compares implementations of both theories onto simulated systems. Dynamic 
Programming is realized through a Differential Dynamic Programming Algorithm, 
where utilizes a forward-backward pass to iteratively optimize a control sequence and 
trajectory. The Maximum Principle is realized via Gauss Pseudospectral Optimal 
Control, where the optimal control problem is first approximated through polynomial 
basis functions, then solved, with optimality being achieved through the costate 
equations of the Maximum Principle. The results of the report show that, for 
short time Horizons, DDP can optimize quickly and can generate a trajectory that 
utilizes less control effort for the same problem formulation. On the other hand 
Pseudospectral methods can optimize faster for longer time horizons, but require a 
key understanding of the problem structure. Future work involves completing an 
implementation of DDP in a C+-1- code, and testing the speed of convergence for 
both methods, as well as extended the Pseudospectral Optimal Control framework 
in to the world of stochastic optimal control. 


1 Introduction 

As systems become more complex and tasks become more demanding, optimal control of 
multi-body systems becomes increasingly challenging. There are many methods built for various 
classes of optimal control problems, but this paper aims to compare two specific methods, 
namely Differential Dynamic Programming, and Gauss Pseudospectral Optimal Control in 
order to compare their respective ability to handle complex dynamic and algebraic constraints. 

The impact of this project can enter the realm of control with contact dynamics, a notoriously 
difficult problem with automation, as well as stochastic control. In order to discuss the aforementioned 
methods for optimal control, we first understand their roots in the two pillars of optimal control, 
Bellman’s Dynamic Programming, and Pontryagin’s Maximum Principle. 
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1.1 Dynamic Programming 

The basis of Dynamic Programming lies in discretizing the path from the initial state to the final 
state, then finding the optimal path for each discrete step. This approach will then compare the 
most optimal path through the entirety of the states, and compute a global optimum. Dynamic 
Programming provides a sufficient condition for an optimal trajectory. The primary weakness, 
however, is that it must compute the optimal trajectory for every possible discrete interval. As 
a result, it suffers from the curse of dimensionality as the number of states and the resolution 
of each step increases. A consequence of the theory of dynamic programming leads to the 
Hamilton-Jacobi-Bellman Equation. 

Hamilton-Jacobi-Bellman Equation 


F[x(t), t] = min 


rt+At 


x(f) = f[x(t),u(t),t] 
T[x(t), u (t),t]dt + V\x! (t), t + At] 


L •H 


(1) 

( 2 ) 


H[x(t),t] is the cost to go to state at time t in the system. 

E[x'(f),f + At] is the penalty or the cost to go to the next state after time At. 

L[x(t), u(t), t] is a running cost that accrues as control is utilized to move from one state 
to another. 


x'(t) = x(t + At) 


The next step is to be able to solve this cost function V from one state to another over time, 
and to minimize the control u(t) over that time interval. 

We can add and subtract P[[x(t),t]] on the right-hand-side of the equation. 


P[x(f), t] = min 


rt+At 

Jt 


L[x(t),u(t),t]dt + V[x.'(t),t + At] — P[x(t), t] + P[x(t), t] 


(3) 


dV = V[x'(t),t + At] — H[x(t), t] (4) 


P[x(t),t] 


mm 


rt+At 


L[x(t), u(t), t]dt + dV + E[x(t), t] 


(5) 


Next we can utilize Taylor Polynomials to create an expression for dV that can be substituted 
into Equation ([5]). At this point the notation expressing x or u as a function of time will be 
dropped: x(t) = x. 


V[x' , f + At] = 


V[x' — x + x, t + At] = V[x + Sx, t + At] 
r 1 dV (. T r r p (. 1 

y [x, t] + —5t + V x Sx + - 


5x 

T 

Vxx 

V xt 


5x 

_5t 


V tx 

Vu 


St 


( 6 ) 

(7) 


With this formulation of dV utilize the fact 5x = /[x, u, t]St and the fact that higher order 
terms go to zero as they are very small. Substituting this in Equation (0) gives 
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V[x',t + At] = V[x,t] + ^St + V x f[x,u,t]St + ^ 


f[x,U,t]St 

T 

Vxx 

V xt 


f[x,u,t\5t 

6t 


v tx 

Vtt 


8t 


f [x, u, t\5t 

T 

Vxx 

V xt 


/[x,u,f](ft 

St 


V tx 

Vtt . 


St 


( 8 ) 

= 0 (9) 


Equation @ is true because higher order terms go to zero. The factor of 5t in each term of 
the matrices ensures that there will be a 5t 2 in every piece of Equation d9j) . Next, bringing over 
over to the left-hand-side will give: 


V[x,t] = min 

u 


rt+At 

Jt 


dV 

dV = —5t + Vjf[x,u,t]5t 


dV 


L[x, u), t]dt + + Ej/[x, u, t]5t + V[x, t] 


( 10 ) 

( 11 ) 


Cancelling out V[x.(t),t] on both sides and applying L[x, u, t\dt = L[x,u,t]5t allows 

5t to be divided out of the equation. Finally move ^ to the left hand side and the result is 
Equation (fT5l) . 


0 = 


dV_ 

~dt 


mm 

U 


mm 

u 


mm 


mm 


J L[x,u,t]dt +—St + V x f[x,u,t}8t 
dV 

L[x,u,t]St + -7— + Vj f[x, u, t]8t 
dV 

£[x, u > *] + -Qj: + Ej/[x, u, t] 


L[x, u, t] + Vf/[x,u,i] 


( 12 ) 

(13) 

(14) 

(15) 


Thus Equation (1151) gives the Hamilton-Jacobi-Bellman Equation. This equation provides a 
basis to solve for a cost to go from one end of the state to another based on the running cost L 
and the terminal cost at the final state. 

Now that we have the equation, it is necessary to determine the boundary condition for this 
non-linear partial differential equation. If we are trying to move from state B to state C, the 
total cost of going to state C will be the terminal cost at C, plus any sort of running cost that 
builds up as we input control into the system. This is shown in Equation (1161) . 


V[x(t B ),t B ] = min 



L[x(t),u(t),t]dt + V[x.'(t c ),tc\ 


(16) 


As a result, in the HJB equation, we can solve for the value function utilizing the ’’cost to go” 
or the terminal cost of state C as a boundary condition. This is where the terminal boundary 
condition is utilized in dynamic programming. Once the value function has been solved over 
time, we can find the ’’cost to go” required to be at state B. Now that this is known, we can 
repeat the process for a prior state movement from A to B. 
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1.2 The Maximum Principle 

Pontryagin’s Maximum Principle can be stated as necessary conditions for optimality through 
the use of calculus of variations. An optimal control history will cause the cost function to 
remain stationary with respect to the control function within the constraint of the system 
dynamics and initial state [3]. 

This approach is general and does not require the specification of the shape of the control 
function before calculation, unlike parametric optimization. The controls can even include 
piecewise continuous functions. 

A general non-linear, time-varying dynamic system can be defined with the following equation. 
And then the derivative of the state can be brought to the right hand side of the equation 
creating an equality constraint that is represented by the state dynamics. When this equation 
is satisfied, the equality constraint is zero. 


*(*) = f[x(t),u(t),t] (17) 

f[x(*),u(t),t] -x(t) = 0 (18) 

Now, the constraint can be adjoined to the cost function in order to create a single equation 
that can be minimized with respect to the control. Equation (1181) can be combined into the 
cost function J via a vector of Lagrange multipliers. 


J = cj)[x(tf),tf}+ f £[[x(f),u(f),f] + A r [f[x(f),u(f),f] - x(t)]]dt (19) 

Jt 0 

• £ is the Lagrangian of the system, the term represents the energy required to change the 
state from the initial to the final. 

• cf) is the penalty or the cost at the final state. 

• A is the vector of Lagrange multipliers that adjoins the dynamic constraint to the cost 
function. 

The next step is to utilize the Hamiltonian, T~L to simplify the equations to manipulate them 
further via integration by parts. 


'H[x(t), u (t), A = C[x(t), u(f), t] + A T f [x(f), u(t),t] (20) 

J = (f[x(tf),tf}+ f [ft[x(t),u(t),A(t),t] - X T ±(t)}dt (21) 

Jto 

J = (22) 

(f)[x(t f ),tf\ + [A T (f o )x(f 0 ) - A T (t/)x(t/)] 

+ f [H[x(t),u(t), X(t),t] + \ T (t)x(t)]dt 
Jt 0 

After using the Hamiltonian, the stationarity of the cost function can be defined as variation 
of the cost function, A J, being zero by any first order effect of control variation, Au. This is 
similar in taking a Taylor series to the first order terms of J, shown in equation (1231) 
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(23) 


A J 


dJ dJ . 

-Au + -Ax(Au) 


With a fixed final time, the first variation due to control is 


A J = 


d (j) 
dx 


— A r ]Ax(Au) 


+ A t Ax(Au) 


t=tf 


, rt * t &H A <dU - TlA , 
+ / f—Au+[--bA r ]Ax(Au) 


t=to 


'to 


du 


dx 


dt (24) 


A J = A J{t f ) + A J(t 0 ) + A J(tO, tf ) (25) 

These 3 parts must equal zero separately for the control to be optimal. Luckily, we can assume 
that the control has no effect on the initial condition of the cost, so AJ(fo) = 0. This leads to 
the next two equations that provide the basis for the necessary conditions of local optimality. 

A (t) should satisfy the differential equation with the additional final condition. 

(26) 
(27) 

equation to solve for the control 


A T (t) = - 


dU 

dx 


A T (tf) = 


d<p 

dx 


t=t f 


Now that A has been identified, the next equation gives the 
function 



(28) 


Equations m, m and (1281) are called the Euler-Lagrange Equations. These are the 
necessary conditions when the Hamiltonian is stationary along the optimal trajectory. They 
are local conditions because they do not imply that there may be other optimal paths. 

Because these equations are coupled, we may require a simpler method of solving them 
separately. One approach is to relate the Euler Lagrange Equations to the original cost function 
and the dynamics. F and G are the Jacobian matrices, and 


dC J 


6m T _ 

dx dx 

dU T _ dC T 

du du 


+ F 1 A 


+ G t A 


(29) 

(30) 


The adjoint vector can be obtained by substituting into the dynamic equation with the final 
condition. 


A it) 


-F T (t)X(t) 


dC T 

dx 


A T (t f ) 


dcj> 


dx 


t=tf 


(31) 

(32) 
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Finally, the optimal control history u can be calculated by solving the equation 

r)C T 

+ G T {t)\(t ) = 0 (33) 

Once again, this process utilizes the state dynamics as a constraint, a cost function based off 
of 4> and C, and calculus of variations to determine an optimal control function that will keep 
the energy required to drive the system from the initial state to the final state to a minimum. 
These are necessary conditions for optimality, but not sufficient conditions as the stationarity 
of the Hamiltonian may not necessarily be an optimal path. 

In order to guarantee that the solution is a minimum for the optimal control problem, sufficient 
conditions must be introduced. This section is primarily based off of the first of the three tests 
discussed in [3] . The test is the strengthened Legendre-Clebsch condition, which is shown below 
in equation llMl) . 


|^[x*,u*,A*,f] =U uu >0 (34) 

This equation can be interpreted that the Hamiltonian function is concave with respect to 
the control function. As a result, the stationary point that represents the zero variation of the 
Hamiltonian with respect to the control function is a minimum point, assuring optimality of 
the control function. This matrix is the Hessian matrix of the Hamiltonian with respect to the 
control. This sufficient condition is again only a local condition for the respective time period, 
globally the Hamiltonian may not always be concave. 

1.3 Simulated Systems 

Both of these optimal control frameworks are implemented on three canonical test systems for 
a variety of conditions. The first system these frameworks are simulated on is a cart-pole with 4 
states and a single control. This under-actuated simulation a popular benchmark for trajectory 
optimization algorithms. The second test system is the double cart pole, which is a system with 
6 states and still a single control. The ability for an optimal control algorithm to be able to 
optimize such a system is quite powerful. The final system is a quadrotor with 12 states and 4 
controls. This is the beginning of a series of simulations that will eventually lead to a real-time 
implementation on a flying quadrotor. 

2 Differential Dynamic Programming 

Now we can apply the Dynamic Programming Principle to an algorithm for Differential Dynamic 
Programming. This algorithm is part of class of methods known as ’’shooting” direct methods. 
This is where the optimality conditions are not directly computed, and many trajectories are 
tested in order to find a minimum cost. Over each iteration, the trajectories are improved, 
leading to better cost. 

2.1 Equations 

Next we derive the basic equations required for the backward pass in Differential Dynamic 
Programming. 


6 



We first begin with the cost function that must be minimized V(x(to),to)- 


V(x(t 0 ),t 0 ) 


mm 


rtf 


^( x (*/).*/) + J C[x(t),u(t),t]dt 

Cost to go ^ v- y 

Running Cost 


( 35 ) 


This cost is subject to the dynamics of the system given an initial condition or a nominal 
trajectory (x, u, the algorithm of DDP must have a baseline to start from). 


x(f) = F [x(t),u(f),t] (36) 

From my understanding, now we want to linearize the dynamics in order to produce an 
equation that will show the change with respect to a 5x in terms of discrete time. This is 
required to generate a programmable algorithm that works in discrete time steps. 


x.(t) = F[x + <5x, u + <5u, t] = F[x, u, t] + V x F5x + V u F5u (37) 

6x = x.(t) — F[x, u, t] = V x Fd'x + V u F<5u (38) 

^^ 

Differential Dynamics 

Sx = ^ x(4+l) ~ ^ X(4) = V X F Sx(t k ) + V u F5u(t k ) 
at 

5x(t k+ i) = Sx(t k ) + V X F Sx(t k )dt + V u F5u(t k )dt 
6x(t k+ 1 ) 

— (Inxn V X F dt) Sx(t k ) + (S7 u Fdt) Su(t k ) 

' -V-' S -v-" 

The completed linearized dynamics (1391) along with the Bellman Dynamic Programming (1401) 
Principle in discrete time gives the basis for Differential Dynamic Programming. 


5x(t k+ 1 ) = &(t k )5x(t k ) + B(t k )Su(t k ) (39) 


V[x(t k ),t k ] 


min 

u 


C[x(t k ),u(t k ),t k ]dt + V[x(t k+ i),t k+1 ] 


(40) 


In our algorithm, the backwards pass is the one utilized to solve for the optimal control input. 
The backwards pass implies we must start from the desired state, and because Equation (14U1) 
contains that information in the form of V[x(t k+ i),t k+ i], the next step is to expand the value 
function V out to the second order. By expanding this equation out, we can create a quadratic 
form for our value function with equations for V, V x , and V xx . After doing so we can solve for 
our optimal control. 
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Expand E(x, 4 + 1 ) to the second order around the nominal trajectory 


E(x(4 + i,4 + i)) = V x(4 + i) -x(4 + i +x(4 +1 ),4 + i 


= V x(4 + i) + <5 x(4 + i), 4 + i 


= Vi x(4+i),4+i) + 


14(4+i) t M4+i) + -5x(4 + i) t E xx (4 + i)5x(4 + i) (41) 


In Equation (141 1) . the following three terms are true. 

<fx(4 + i) = x(4 +1 ) - x(4+i) 

^4 (4+i ) = V X E (x, 4 + 1 ) 


X = X£ 


l xx(4+i) — V xx v (x, 4 + 1 ) 


k +1 


*fc+l 


(42) 

(43) 

(44) 


Now we will substitute our equation for <5x, Equation (1391) into the expansion for V, Equation 
(1411) . All values of x in the expansion are for the nominal value x. 


E(x(4+i,4+i)) = E(4 +1 ) + E x (4+i) t ^<!>(4)<5x( 4) + B(4)<5u(4)^ + 

^ ^(4)<5x(4) + B(4)<5u(4)^ E xx (4+i) ^(4)<5x(4) + B(4)5u(4)^ 

= E(4+i) + 14(4+i) i ^(4)<^x(4) + E x (4+i) t B(4)<5u(4) + 

^x(4) t $(4) t E xx (4 +1 )$(4)5x(4) + 

i(fu(4) r B(4) T E xx (4+i)B(4)5u(4) + 

^u(4) t B(4) t E xx (4+i)^(4)5x(4) + 

^4x(4) i $(4 ) t E xx (4 + i)B(4)5u(4) 


(45) 




Expand £(x, u, 4) to the second order around the nominal trajectory 

This local quadratic expansion of C is very similar to the previous expansion of V. In this case, 
we have to take the derivatives of both x and u. 


£[ x(4),u(4),4 )dt = 


£ (x(t k ) + Sx(t k ), u(4) + <5u(4), 4^ dt 

£(x(4), u (t k ),t k )dt + (V x £(x(t fc ), u (t k ),t k )dt) T 6x(t k ) + 

ittk) q (t k ) T 

(V u £(x(4), u (t k ),t k )dt) T Su(t k ) + 

'-v-' 

r(tfe) T 

x<^x(4) t V xx £(x(4), u (t k ),t k )dt 5x(t k ) + 

1 '-V-' 

Q(*k) 

-5u(4) T V uu £(x(4), u (t k ),t k )dt <5u(4) + 

1 '-v-' 

R-(ifc) 

-5u(4) T V xu £(x(4), u(4), t k )dt Sx(t k ) + 

1 ' - v -' 

N(t fc ) 

-5x(4) t V ux £(x(4),u(4),4)dt (5u(4) (46) 

---- 


£ (x(tjc), u{t*), tjc) d* = q(t k ) + q(tfe) r dx(tfe) + r(t fc ) T du (t k ) + 

^<Sx(4) t Q(4)<Sx(4) + ^u(4) t R(4)(5u(4) + 
^u(4) t N(4)<5x(4) + ^x(4) t M(4)<5u(4) (47) 


Combine the expansions of C and V 

Now we can combine Equations (1451) and (1471) in order to simplify our equations for the value 
function V. 


C[x(4),4] = 


mm 

Su(t k ) 


£[x(4), u(4), t k \dt + E[x(4+i), 4+i 


Equation G3 


Equation fist 


1 


q(t k ) + q(4) T 5x(4) + r(4) T <5u(4) + -dx(t k ) T Q(t k )5x(t k ) + 


mm 

Su(t k ) 

t^u(4) t R(4)<5u(4) + ^5u(4) t N(4)<5x(4) + ^<5x(4) T M(4)<5u(4) + 

E(4+i) + V x (t k+1 ) T &(t k )Sx(t k ) + E x (4 + i) t B(4)<5u(4) + 
^x(4) r $(4) T E xx (4+i)$(4)(5x(4) + )Uu(4) T B(4) r C xx (4 +1 )B(4)<5u(4) + 


5u(4) T B(4) T V xx (4 + i)$(4)<ix(4) 


(48) 
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Now we can take Equation (1481) and set the isolate the terms that contain 5u(t k ) T . This will 
require some rearrangement of some terms such as r(4), but in the end, we will be able to solve 
for Su(t k ) because all the terms containing it must sum to zero. 

0 = r (4) + ^R(4)5u(4) + (4)<5x(4) + ^M(4)<5u(4) + B(t k ) T V^{t k+1 ) + 

B(4) T E xx (4+i)B(4)5u(t fc ) + B(4) T E xx (4 + i)$(4)hx(4) 

Now lets rearrange to put like terms together. 

0 = ^r(f fc ) + B(t k ) T V x (t k+ i) \ + Qn (t k ) + (t k ) T + B(t k ) T V xyi (t k+1 )&(t k )\ Sx(t k ) 

'-v-' N --V- 

g G 

+ \ + b (4) t K; X (4+i)B(4)^ <5u(4) 

N-- 

H 


0 = g + G<5 x( 4) + ^H6u(4) 

-^H<5u(4) = g + G &x(4) 

5u(t k ) = -2H- 1 g + -2H- 1 G 6x(4) 

"•-v- ' s -v- ' 

1(4) L (4) 

hu(t fc ) = 1(4) + L(4)hx(4) (49) 

Now we can take Equation (TT9T) and substitute for 5u(t k ) in our expansion of V in Equation 

(USD- 


E[x(4), 4] = q(t k ) + q(4) T <5x(4) + r(t k ) T (l{t k ) + L(t k )6x(t k ) \ + ^8x(t k ) T Q(t k )5x(t k ) + 

\ ^1(4) + L(4)i5x( 4)^ R(4) ^1(4) + L(4)hx(4)^ + 

\ ^1 (4) + L(4)(5x(4)^ N(4)5x(4) + ^5x(4) T M(4) fl(4) + L(4)hx(4)^ + 
V (4+i) +14 (4+i ) 7 $(4)5x( 4) + E x (4+i) i B(4) ^1(4) + L(4)<5x(4)^ + 
i(5x(4) T ^(4) T 14 x (4+i)^(4)dx(4) + 

^^1(4) + L(4)(5x(4)j B(4) t E xx (4 + i)B(4) fl(4) + L(4)<Sx(4)^ + 

(l(t k ) + L(4)hx(4)^ B(4) T E xx (4 + i)$(4)hx(4) (50) 
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Split into Zero, First, and Second Order Groupings 

For the final step of this process, each term in Equation (1501) can be separated into terms that 
are zero, first, and second order with 5x. This step allows us to derive the Ricatti Equations 
for V(t k ), 14(4), and 14x(4)- 

For each of the equations, 5x is then set to zero, hx = 0 because V signifies a minimum, 
and at the minimum the variation in value with respect to changes in state should be zero. We 
already reflected the variation with respect to the control when we solved for 6u(t k ) in Equation 

m- 

2.1.1 Zero Order Terms 


= g(t k ) + r(4) T l(4) + ^1(4 ) t R(4)1(4) + V(t k+1 ) + V4(4+i) t B(4)1(4) (51) 

+^ 1 (4) t B(4) t 14 x (4+i)B(4)1(4) 

= q(t k ) + V ( 4+1 ) + fr(4) T + 14(4 +i) t B(4)^ 1(4) + 

-V- 

g (tk) T 

^1(4) t ^R(4) + B(4) t 14 x (4+i)B(4)^ 1(4) 

--v-^ 

H (t fc ) 

= «(4) + ^( 4 + 1 ) +g(4) T l(4) + ^1(4 ) t H(4)1(4) (52) 


ll 




First Order Terms 


= q(4) T <5x(4) + r(4) T L(4)<5x(4) + l(4) T R(4)L(4)(5x(4) + ^l(4) T N(4)£x(4) + 

^1(4)M(4) t <5x(4) + 14(4+i) T $(4)<5x(4) + F x (4+i) t B(4)L(4)<5x(4) + 
l(4) T B(4) T y xx (4 + i)B(4)L(4)£x(4) + 1(4) t B(4) t F xx (4+i)$(4)<5x(4) (53) 

= (4) T + 14(4+i) T $(4)^x(4 ) + (^r(4 ) T + 14(4+i) T B(4 )^ L (t k )6x(t k ) + 

g{tk) T 

l{t k ) T ^R (4) + B(4) t F xx (4+i)B(4)^ L(4)5x(4) + 

H(t fc ) 

!(4) t Qn(4) + ^M(4) t + B(4) t F xx (4+i)$(4)^ <5x(4) 

s --- 

G(t fc ) 

= ^q(4) T + 14(4 +i) t $(4)^ <5x( 4) + g(4) T L(4)<5x(4) + 

1(4) t H(4)L(4)<5x(4) + 1(4) t G(4)(5x(4) (54) 
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Second Order Terms 


= ^<5x(4) t Q(4)5x(4) + ^x(4) t L(4) t R(4)L(4)5x(4) + 

^x(4) t L(4) t N(4)<5x(4) + ^<5x(4) t M(4)L(4)<5x(4) + 
^x(4) t $(4) t T4x(4+i)^(4)<5x(4) + 
^<5x(4) t L(4) t B(4) t 14 x (4 + 1 )B(4)L(4)5x(4) + 

5x(4) T L(4) T B(4) T y xx (4+i)^(4)^x(4) (55) 

= ^<5x(4) t Q(4)5x(4) + 

^x(4) t L(4) t ^R(4) + B(4) t F xx (4 + i)B( 4) j L(4)<5x(4) + 

'-V-' 

H (t fc ) 

^x(4) t L(4) t ^N( 4) + B(4) T y xx (4 + i)$(4)^(5x(4) + 

^x(4) t ^M(4) + $(4)F xx (4+i)B( 4) t ^ L(4)5 + x(4) 

^x(4) t $(4) t T4 x (4 + i)^(4)<5x(4 (56) 

Riccati Equations 

After the terms are separated by order, derivatives can be taken in <5x in order to get the 
respective Riccati Equations, shown below. 

Vxx(tk) = Q(4) + L(4) t H(4)L(4)+ L(4) t ^N(4) + B(4) t E xx (4 + i)^(4)'|l(4) 

L(4) t ^M( 4) + $(4)E xx (4 + i)B(4) r ^L(4) + $(4) T 14x(4+i)$(4) 

14(4) = q (4) T + 14(4+i) t ®(tk) + g(4) T L(4) + 1(4) T H(4)L(4) + 1(4) T G(4) 

v(t k ) = q(t k ) + V(t k+1 ) + g(4) r l(4) + ^1(4) T H(4)1(4) 

In the following section we will review the equations and methods for Gauss Pseudospectral 
Optimal Control, then compare the results of both algorithms. 
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3 Gaussian Pseudospectral Optimal Control 


Pseudospectral optimal control methods are a significant deviation from the classical methods 
for optimal control. Typically, classical ’’indirect” methods are known to analytically solve for 
first and second order optimality conditions. Then a cost function in optimized by solving for 
a control that satisfies the dynamic constraints and yields the lowest cost. Unfortunately, find 
these analytical solutions and evaluating the optimality of each solution can be difficult. 

On the other hand, Pseudospectral optimal control is in a class of algorithms known as 
’’direct” methods. The general outline (from Huntington) is to do the following: First, We 
will convert the dynamic system into a problem with a finite set of variable and transform 
the the constraints to ensure they are simply algebraic constraints. This will allow us to 
solve with a Non-linear Program (NLP) Solver. Second, We will solve this finite dimensional 
problem with the NLP and optimize the parameters with which we approximated the original 
system. Third, we can assess the accuracy of our approximation utilizing the costate and the 
Karush-Kuhn-'Tucker (KKT) conditions. The advantage of this particular method (for smooth 
problems) is that it exhibits a fast convergence rate. The Gauss Pseudospectral Method is 
described in much further detail in [Q. 

3.1 Equations 

Let us begin with a few equations to describe the problem. The Pseudospectral methods are 
defined with the Transformed Continuous Bolza problem. 


J = $(x(T 0 ),t 0 ,x(T/),t/) + tf 2 t0 j 9(x-(T),u(T),T\t 0 ,tf)dT (57) 

This cost will be minimized subject to the following constraints. 

= — 2 * 0 f ( x ( r )> u ( r )» r ;*o,*/) 

</>(x(r 0 ),f 0 , x (T/),t/) = 0 
c ( x (t), u(t),t; to,tf) < 0 

2 t tf -\- to 

T = --- 

tf — to tf — to 

Equation (T58li is the dynamic constraint to the problem. Equation (l59l) is the boundary 
condition at the final state. Equation (1SU1) is the path constraint for the problem. Equation (RJTT) 
shows the time transformation required for this derivation of pseudospectral optimal control. 
Typically, we require a fixed time interval such as [—1,1], thus such a transformation can be 
used that is still valid with free initial and final times to and tf. 

Collocation Point Search 

One of the most important aspects of this method is the choice of collocation points K. 
These points are where we set the approximation equal to the function across the interval 
and provided the set for our approximation. While there are a variety of points to choose from, 
the three most accurate choices are Legendre-Gauss (LG), Legendre-Gauss-Radau (LGR), and 


(58) 

(59) 

(60) 

(61) 
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Legendre-Gauss-Lobatto (LGL) Points. Below is the equation for the Legendre-Gauss points. 
The points are defined as the roots of Equation (1621) . Equation (1621) is the I\ th degree Legendre 
polynomial. 


-Pa-(t) 2 K K\dT K ^ T2 ^ ^ 

The reason for the points to be appropriated as the roots of this polynomial is because of 
accuracy of the approximation. Equally spaced points will not accurately represent the function 
near the boundaries, hence this distribution actually clusters the points closer to the boundaries 
of [—1,1]. This is another reason of why we need to have a finite boundary for this method (in 
the transformed space). 

State, Control, and Costate Approximation 

The next step of this is to define how we will approximate the state, control, and the costate. 
Note that we do not have to use the same approximation for each one (as in, we do not have 
to use the same collocation points and basis functions for each state, control, and costate), but 
we do use the same process. From Huntington, ’’Let us assume there are K collocation points” 
(as defined above) and use these points to estimate both the state and control. 


K 


x(r) » X(r) = A(r)X(ri) 

i =1 

(63) 

AM- n T - Ti = , 3{T \.(r) 

(64) 

g{r) = (1 + r)P fc (r) 

(65) 


Equation flTTTTl) represents the state approximation in terms of a set of Lagrange interpolating 
polynomials. Equation (1641) represents the actual polynomials, where the Pk is the Legendre 
Polynomial. 


Integral Approximation via Gauss Quadrature 

Within the dynamic constraints and cost function we also have integrals that must be approximated. 
One way to do this is via a Gauss quadrature. The generic form for this is in Equation (1661) . 


r b K 

/ f(r)dT^^2wif(Ti) 
Ja i =l 


( 66 ) 


The points are the quadrature points on the interval [—1,1] and the weights Wi are the 
quadrature weights. Equation (1671) are for finding these weights. 


Wi= Ci{T)dT = - - , (i = L ..., K) 

J -1 (1 ~ Tt)[Pk(Ti)} 2 


(67) 
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Algrebriac Representation of Dynamics 

The next step is to convert the differential dynamic equation constraint on the optimization 
problem into an algebraic constraint. This is done in Equation ([68]) 


K K 

x{r k ) « X(r fc ) = Y, = Y D kiMr k ), (k = 1, K) (68) 

i =1 i=l 

The differentiation matrix which is given in Equation (1691) . 




(l+T k )P K {r k )+P K {Tk) 

(■ T k -Ti ) [ (1+Tj )Pk (n ) +Pk (t» )] 


(1 + Ti)P K (Ti)+2P K (Ti) 

- 2 [(1+Ti)Pff(T<)+Pif (t»)] 


if k^i 

if k = i 


(69) 


We can now implement this algebraic representation in terms of a residual function by 
equation the derivative of our approximation to the vector field. 


Rfc = Y D ki^ij k ) - tf - to f(X(Tfc),U(rfc),r fc ;fo,t/) = 0 ,(k = 1, ...,X) (70) 

Z=1 


Gauss Pseudospectral Discretization of the Continuous Bolza Problem 

Now we can put everything together and define our new problem. We want to minimize the 
following cost function Equation (f7TT) subject to the algebraic collocation constraints in Equation 
(1721) . We also have the quadrature constraint in Equation (1731) , boundary constraint in Equation 
flUD and path constraint in Equation (f75l) . 


J — 3>(X 0 , tO, ^ 2 ° w k9^kt Ufc,7fc; to, tf) (71) 

k =i 

K t -t 

Rfc = ^DfcjXj- ° f(Xfc,Ufc, r fc;to,t/) = 0 (k = 1,... ,N) (72) 

i=0 

R/ = X/ — X 0 - f 2 ° ^^fcf(^aUfc,Tfc;t°,t/) = 0 (73) 

fe=i 

0(X o ,to,X/,t/) = 0 (74) 

C(X fc , U fc , r k ; t 0 , t/) < 0 (75) 

The procedure to solve is as follows. 


1. Choose the initial and final times, then choose your basis functions and use the Legendre 
polynomials to get the collocation points. 

2. Translate the dynamic system into the the discretized and approximated version of the 
Continuous Bolza Problem. Also translate the constraints into the equivalent forms. 
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3. Solve the finite-dimensional problem with a Non-linear Program Solver. 

4. Check the accuracy of the problem by the costate approximation (not listed here), and 
then run again with different conditions, moving towards the optimum. 

The program utilized to implement this method is called GPOPS II [2] .The non-linear 
program was solved using ipopt. 


4 Results 

We applied both algorithms to three systems. The first was a simple cart pole, the second was a 
double cart pole, and the final system was a quadrotor. Tabled] below contains the parameters 
and cost functions for each problem. The parameters and cost functions were kept consistent 
between both algorithms. We will observe the behavior of each algorithm, as well as the final 
cost and total runtime for each algorithm. Note that is not possible to directly compare the 
run time as DDP is currently implemented in MATLAB, while GPOC has a few files that 
run through C++. Nevertheless, a few generic observations can be made about algorithm 
speed. The cost comparison is only approximate, as DDP is a discrete time optimizer (i.e. the 
dynamics are discretized and linearized locally at each timestep) and GPOC is a continuous 
time optimizer. With a small timestep dt for DDP, the costs are comparable. 


Table 1: Problem Settings 


Problem 

System Parameters 

Final Time 

Cost Function 

Cart Pole 

Cart Mass = 1 kg 

Link Mass = 5 kg 

Link Length = 1.5 m 

2 seconds 


Double 

Cart Pole 

Cart Mass = 3 kg 

Link 1 Mass = 1 kg 

Link 2 Mass = 20 kg 
Link 1 Length = 1.5 m 
Link 2 Length = 1.5 m 

4 seconds 

fto ( Ru2 ) dt 

Quadrotor 

Quadrotor Mass = 1 kg 
J x = 8.1E-3 

J x = 8.1E-3 

J x = 14.2E-3 

Arm Length = .24 m 

3 seconds 

fto (x t Qx + u T Ru)dt 


4.1 Cart Pole 

The first system is the classic cart pole. The task is to swing up the cart pole from a downwards 
stable equilibrium to an unstable upwards equilibrium. In Figure [TJ the comparison between 
the Differential Dynamic Programming and Gauss Pseudospectral method can be observed. 
The left column corresponds to DDP, while the right column corresponds to GPOC. In this 
first comparison, DDP ended with a final cost of 0.756, while GPOC ended with a final cost of 
0.1257. The runtime for DDP was 7.5 seconds, while GPOC ran in 0.9 seconds. In Figure [Tal 
we can see that the cart moved to the left and right of the origin far more than for in Figure flbl 
This behavior is characteristic of GPOC, it often decides that a higher control effort is more 
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optimal in order to reach the target. The observation of the angular rate bounds can also be 
observed is Figures fTcl and fldl 
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Figure 1: Cart Pole State Comparison 
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Figure 2: Cart Pole Control Comparison 
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Figure 3: DDP Cart Pole Cost 

We can view in Figure flcl and flfl the difference between the control effort for both algorithms. 
Interestingly enough, the pseudospectral algorithm came up with a lower final cost, even though 
it used higher maximum and minimum control. This observation is also related to the chosen 
masses for the Cart and Link. The link mass is much larger than the cart mass, hence creating 
a larger torque in the link by pushing the cart provides more power to swing up, rather than 
slowly pushing the cart side to side. 

We can see in Figure [3] the cost for DDP quickly converges by iteration 30. This will not 
always be the case, more complex systems, such as the Double Cart Pole will show spikes in 
the cost as DDP exhibits exploration into new control policies. 

4.2 Double Cart Pole 

The double cart pole is a highly under actuated system in which six states are controlled by a 
single control. It is similar to the cart pole system, however, there are now two links that must 
be controlled with just pushing the cart. Once again the task is to swing up both links from the 
stable equilibrium to the unstable equilibrium. Differential Dynamic Programming results in a 
feedforward, as well as a feedback gain, which proves key to solving this system. In order words, 
the system can apply full state feedback in order to reach the optimal trajectory. However, the 
Gauss Pseudospectral method has no such feedback term in its optimal control policy. As it 
result it is entirely feed forward exasperating the difficulty of the under actuated problem. This 
problem difficulty appears to manifest itself in the system runtime, GPOC takes 206 seconds 
to reach an error tolerance below IE-6. DDP on the other hand can solve the problem in 59 
seconds. 

We can observe the optimal solution in Figured) The is a distinction in the trajectories that 
both methods provide. DDP take the entirety of the time horizon, all 4 seconds, to swing the 
two links up into the vertical position. We can see that in Figure I4bl the system manages to 
swing up in less time, and then stays near the origin for the remaining time. Once again, GPOC 
stays keeps the cart closer to the origin at the expense of higher maximum and minimum control 
effort. 


19 








(a) 


DDP Double Cart Pole: Heavy Link 2 
Pendulum 1 State 



time (s) 



time (s) 


(b) 

DDP Double Cart Pole: Heavy Link 2 


Pendulum 2 State 



time (s) 



time (s) 


(c) 
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Figure 4: Double Cart Pole State Comparison 
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Figure 5: Double Cart Pole Control Comparison 
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Figure 6: DDP Double Cart Pole Cost 

Analyzing Figure 0 we can see the control effort differential manifest itself. Further enforcing 
GPOC’s generic behavior, the control levels off to zero after 2.7 seconds, which is when the cart 
is almost to the origin and the links are almost vertical. The final cost of the DDP algorithm 
was .3632, while the final cost of the GPOC algorithm was 0.187. Even though the optimization 
took longer, the final solution was far more optimal, as long as control remains cheap. 

We can see the oscillations in the cost for DDP in Figure [6l As mentioned previously, DDP 
has the ability to explore for new policies given a particular quadratic structure of the cost 
function. Ultimately, the cost converges to a much more optimal minimum. 

4.3 Quadrotor 

Finally we come to the quadrotor system. The dynamical system has 12 states corresponding 
to position velocity, orientation, and angular rates. There are now four controls in this system 
as opposed to singular control in the other systems, these four controls correspond to the thrust 
of each rotor. The initial position was [—1,1, .5] and the target state was [.5, —1,1.5]. The cost 
function now has an additional term which penalizes excessive orientation angles, such as the 
roll angle being 90 degrees. 
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Figure 7: Quadrotor State Comparison 
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DDP Quadrotor Control Effort 
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Figure 8: Quadrotor Control Comparison 



Figure 9: DDP Quadrotor Pole Cost 

We can observe the same patterns between the two algorithms. GPOC utilizes more force, 
but reaches the target faster, while DDP takes the entire time horizon. Note that DDP was 
extremely sensitive to the discrete time step size, as well as the choice of weights in the cost 
function. DDP converged with a cost of 44.6182 with a runtime of 41 seconds. GPOC converged 
with a cost of 47.9891 and with a runtime of 18 seconds. Once again GPOC is faster, and notice 
in Figure [3 DDP did not converge precisely to the desired target state, thus GPOC delivered 
a more optimal solution. This time we see that angle, velocity, and angular rate were higher 
with GPOC as opposed to DDP. This is likely related to the increased amount of control seen 
in Figure El but also to how sensitive GPOC and DDP are to weights in the cost function. For 
GPOC, the final state must exactly match the bounds described in the problem, while for DDP, 
the optimal trajectory need not necessary reach the final target exactly, again depending on how 
heavily the weights are set. This is a fundamental difference in the behavior of each algorithm. 
Finally, we can observe that this rotor configuration is standard and does not reflect on how 
well each algorithm withstands crazy configurations, such as large rotor to CoM lengths, or 
non-symmetric inertias. This was explored partially by the previous experiments, as the carts 
were made to be far lighter than the links, to increase the difficulty of optimization. 
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5 Future Work 


These simulations are just the beginning for the analysis of these two trajectory optimization 
techniques. Ultimately the goal is to apply both methods into an ROS simulation of the 
Barret Technologies WAM arm and our quadrotor. In order to do this, the Discrete Time 
DDP Implementation must be completed in C++, and it would also be best to analyze the 
performance of the continuous time formulation against GPOC. In addition, we want to extend 
the capabilities of these algorithms, namely into dynamics with contact and stochasticity. 
Currently, I am studying methods to propagate uncertainty through polynomial chaos, as well 
as methods to solve a linear complementary problem with DDP to incorporate discontinuity. 
Finally, both methods can also be studied with a receding horizon framework, to assess their 
ability to be applied to real systems. DDP has the advantage of feedback gains pushing the 
system towards the optimal trajectory, GPOC has the ability to divide the trajectory into small 
bits, to avoid the need for feedback by optimizing more often and faster. There is much to be 
done in the future, but the possibilities are promising. 
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